Overview

Dataset Statistics

Number of Variables 6
Number of Rows 500000
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 20
Duplicate Rows (%) 0.0%
Total Size in Memory 76.3 MB
Average Row Size in Memory 160.0 B
Variable Types
  • Numerical: 4
  • Categorical: 2
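
The overview figures above can be recomputed from the raw rows. The following is a minimal sketch in plain Python, not the profiling tool's own code; the function name `overview` and the toy rows are hypothetical, and it assumes a duplicate is counted once for each repeat of an earlier row, which may differ from the tool's definition. Note that 20 duplicates out of 500000 rows is 0.004%, which is why the report shows 0.0% after rounding.

```python
from collections import Counter


def overview(rows):
    """Summarize a list of row tuples: row count, missing cells, duplicate rows."""
    n_rows = len(rows)
    n_cells = sum(len(r) for r in rows)
    # A cell is treated as missing when it is None (assumption for this sketch).
    missing = sum(1 for r in rows for v in r if v is None)
    # Each repeat of an already-seen row counts as one duplicate.
    duplicates = sum(c - 1 for c in Counter(rows).values())
    return {
        "rows": n_rows,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


# Hypothetical toy rows (session_id, item_id, quantity), not taken from the dataset.
toy = [(1, 10, 0), (1, 10, 0), (2, 11, 1), (3, None, 0)]
summary = overview(toy)
```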

Dataset Insights

price and quantity have similar distributions (Similar Distribution)
item_id is skewed (Skewed)
price is skewed (Skewed)
quantity is skewed (Skewed)
timestamp has a high cardinality: 499956 distinct values (High Cardinality)
category has a high cardinality: 152 distinct values (High Cardinality)
timestamp has constant length 24 (Constant Length)
price has 484309 (96.86%) zeros (Zeros)
quantity has 484309 (96.86%) zeros (Zeros)

Variables


session_id

numerical

Approximate Distinct Count 477134
Approximate Unique (%) 95.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6000000
Mean 5.6262e+06
Minimum 24
Maximum 11562157
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • session_id is skewed right (γ1 = 0.0581)

Quantile Statistics

Minimum 24
5-th Percentile 523485.75
Q1 2.6873e+06
Median 5.5267e+06
Q3 8.5428e+06
95-th Percentile 1.0947e+07
Maximum 11562157
Range 11562133
IQR 5.8555e+06

Descriptive Statistics

Mean 5.6262e+06
Standard Deviation 3.3623e+06
Variance 1.1305e+13
Sum 2.8131e+12
Skewness 0.05806
Kurtosis -1.2104
Coefficient of Variation 0.5976
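
The descriptive statistics reported for each numerical variable follow from standard moment formulas. The sketch below is plain Python rather than the profiling tool's code; the function name `describe` is hypothetical, and it assumes population central moments for skewness and excess kurtosis together with the sample (n - 1) variance. The report does not state which conventions it uses, so small differences are possible. For reference, a uniform distribution has excess kurtosis -1.2, so the reported -1.2104 suggests session_id values are spread fairly evenly over their range.

```python
import math


def describe(values):
    """Return mean, sample std/variance, skewness, excess kurtosis, and CV.

    Requires at least two values, non-zero spread, and a non-zero mean.
    """
    n = len(values)
    mean = sum(values) / n
    # Population central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    variance = m2 * n / (n - 1)  # sample variance
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a normal)
        "cv": std / mean,                # coefficient of variation
    }


# Hypothetical toy values, not taken from the dataset.
stats = describe([1, 2, 3, 4, 5])
```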

timestamp

categorical

Approximate Distinct Count 499956
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 44500000

Length

Mean 24
Standard Deviation 0
Median 24
Minimum 24
Maximum 24

Sample

1st row 2014-04-01T03:01:4...
2nd row 2014-04-01T03:06:0...
3rd row 2014-04-01T03:07:2...
4th row 2014-04-01T03:09:0...
5th row 2014-04-01T03:10:1...

Letter

Count 1000000
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1000000
Dash Punctuation 1000000
Decimal Number 8500000
  • timestamp contains many words: 499956 words
  • timestamp has words of constant length
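
Because timestamp is stored as a fixed-length string, it is profiled as categorical; parsing it into a datetime would allow time-based analysis. The sketch below assumes the 24-character layout `YYYY-MM-DDTHH:MM:SS.mmmZ`, which is consistent with the per-row character counts above (2 uppercase letters, 2 dashes, 17 digits) but is not confirmed by the truncated samples. The helper `parse_timestamp` and the example value are hypothetical.

```python
from datetime import datetime, timezone

# Assumed layout: ISO 8601 with millisecond precision and a trailing "Z" (UTC).
FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(text):
    """Parse one 24-character timestamp string into a timezone-aware datetime."""
    if len(text) != 24:
        raise ValueError(f"expected 24 characters, got {len(text)}")
    return datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)


# Hypothetical value in the assumed layout, not a row from the dataset.
example = "2000-01-02T03:04:05.678Z"
parsed = parse_timestamp(example)
```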

item_id

numerical

Approximate Distinct Count 23650
Approximate Unique (%) 4.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6000000
Mean 2.1679e+08
Minimum 214507331
Maximum 643078800
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • item_id is skewed right (γ1 = 14.3933)

Quantile Statistics

Minimum 214507331
5-th Percentile 2.1454e+08
Q1 2.1468e+08
Median 2.1483e+08
Q3 2.1485e+08
95-th Percentile 2.1485e+08
Maximum 643078800
Range 428571469
IQR 169498

Descriptive Statistics

Mean 2.1679e+08
Standard Deviation 2.9475e+07
Variance 8.6879e+14
Sum 1.0839e+14
Skewness 14.3933
Kurtosis 205.171
Coefficient of Variation 0.136
  • item_id is not normally distributed (p-value 4.231846719310939e-25)
  • item_id has 2379 outliers
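
The report does not say how outliers are defined. A common choice is the box-plot rule, which flags values more than 1.5 × IQR outside the quartiles; the sketch below implements that rule as an assumption, not as the tool's confirmed method. The function name `iqr_outliers` and the toy values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot rule)."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical toy values: a tight cluster plus one far-away point.
clustered = iqr_outliers([10, 11, 12, 13, 14, 100])

# When most values are identical the IQR is 0, so every other value is flagged.
mostly_zero = iqr_outliers([0] * 19 + [5])
```

With a very narrow IQR relative to the range, as reported for item_id (IQR 169498 against a range of 428571469), this rule would flag the small group of unusually large identifiers.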

category

categorical

Approximate Distinct Count 152
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 33012200
  • The largest value (0) is over 1.51 times larger than the second largest value (S)

Length

Mean 1.0244
Standard Deviation 0.4239
Median 1
Minimum 1
Maximum 10

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 163584
Lowercase Letter 0
Space Separator 0
Uppercase Letter 163584
Dash Punctuation 0
Decimal Number 348616
  • The top 2 categories (0, S) take over 50.0%
  • The largest value (0) is over 1.51 times larger than the second largest value (S)
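
The two category insights (share of the top categories, and the ratio of the largest to the second largest) can be derived from simple value counts. This is a minimal sketch in plain Python; the function names `top_k_share` and `dominance_ratio` and the toy labels are hypothetical, and the profiling tool may compute these differently.

```python
from collections import Counter


def top_k_share(values, k=2):
    """Return the k most frequent values and their combined share of all rows."""
    top = Counter(values).most_common(k)
    share = sum(count for _, count in top) / len(values)
    return top, share


def dominance_ratio(values):
    """Return count of the most frequent value divided by the runner-up's count."""
    (_, first), (_, second) = Counter(values).most_common(2)
    return first / second


# Hypothetical toy labels, not the dataset's actual frequencies.
labels = ["0"] * 6 + ["S"] * 3 + ["7"]
top, share = top_k_share(labels)
ratio = dominance_ratio(labels)
```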

price

numerical

Approximate Distinct Count 324
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6000000
Mean 111.6206
Minimum 0
Maximum 198863
Zeros 484309
Zeros (%) 96.9%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 43.3239)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 198863
Range 198863
IQR 0

Descriptive Statistics

Mean 111.6206
Standard Deviation 1479.92
Variance 2.1902e+06
Sum 5.581e+07
Skewness 43.3239
Kurtosis 3366.2621
Coefficient of Variation 13.2585
  • price is not normally distributed (p-value 4.230098081255514e-25)
  • price has 15691 outliers

quantity

numerical

Approximate Distinct Count 20
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 4500000
Mean 0.04433
Minimum 0
Maximum 30
Zeros 484309
Zeros (%) 96.9%
Negatives 0
Negatives (%) 0.0%
  • quantity is skewed right (γ1 = 27.6733)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 30
Range 30
IQR 0

Descriptive Statistics

Mean 0.04433
Standard Deviation 0.3529
Variance 0.1245
Sum 22163
Skewness 27.6733
Kurtosis 1566.7861
Coefficient of Variation 7.9605
  • quantity is not normally distributed (p-value 4.384905469305015e-25)
  • quantity has 15691 outliers
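
Both price and quantity are zero in 484309 of 500000 rows, so their quartiles and IQR are all 0 and the overall mean mostly reflects the zeros. The reported 15691 outliers equals the number of nonzero rows (500000 - 484309), which is what a quartile-based outlier rule would produce when the IQR is 0; whether the tool uses such a rule is an assumption. The sketch below separates the zero share from the mean of the nonzero values; the function name `zero_inflation_summary` is hypothetical, and the final lines only redo arithmetic on figures reported above.

```python
def zero_inflation_summary(values):
    """Report the zero count/share plus the mean over all rows and over nonzero rows."""
    n = len(values)
    nonzero = [v for v in values if v != 0]
    zeros = n - len(nonzero)
    return {
        "zeros": zeros,
        "zeros_pct": 100.0 * zeros / n,
        "mean_all": sum(values) / n,
        "mean_nonzero": sum(nonzero) / len(nonzero) if nonzero else None,
    }


# Hypothetical toy values, not taken from the dataset.
toy_summary = zero_inflation_summary([0, 0, 0, 2])

# Arithmetic on the figures reported for quantity: rows, zeros, and sum.
n_rows, zeros, total = 500000, 484309, 22163
nonzero_rows = n_rows - zeros                # rows with quantity > 0
mean_nonzero_quantity = total / nonzero_rows  # roughly 1.41 items per nonzero row
```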
